class: center, middle, inverse, title-slide

.title[
# Drawing causal diagrams
]
.author[
### Christoph Hanck
]
.date[
### Summer 2023
]

---
layout: true

<a style="position: absolute;top:5px;left:10px;color:#004c93;" target="Overview" href="https://kaslides.netlify.app/">
</a>

---

## Recap of causal diagrams

Causal diagrams are representations of the data generating process (DGP) that produced our data. They help us answer research questions. Hence, drawing our own causal diagram comes down to putting our idea of what the DGP is onto paper (or a computer screen).

<br>

<div class="figure">
<p class="caption">2: A simple causal diagram</p>
</div>

---

## Causal diagrams

**Assumptions for causal diagrams**

- Assumptions about the data generating process are very important for causal diagrams.
- The quality of our research will hinge on how accurate those assumptions are.
- Think about whether the assumptions are reasonable, and base them as much as possible on well-established knowledge and prior research.
- If there is reason to be skeptical of an assumption, we should ask ourselves what evidence would support it and try to *provide* that evidence.

---

## How to get from DGP to causal diagram?

**General steps**

1. Make a list of all relevant variables.
   - A variable on a causal diagram is a measurement we could take that could result in different values.
   - Every variable that causes the treatment or the outcome, or that causes something that in turn causes the treatment or the outcome (and so on further up the causal chain), is a good candidate for inclusion.
2. Narrow the list down to the variables that (strongly) cause the outcome.

---

## How to get from DGP to causal diagram?

**General steps**

<ol start=3>
<li>Categorise the variables into <b>treatment variables</b> and <b>outcome variables</b>.
<p>Think about how the variables cause each other and whether they are caused by the treatment or the outcome.</p>
<p>We might also want to consider whether any of the variables are related without either causing the other, in which case they must share some (unobserved) common cause that we can include.</p>
</li>
<li>Add the non-treatment and non-outcome variables.</li>
<li>Create the causal diagram and carefully revise.
Is an important variable missing?</li>
</ol>

---

## Reducing the complexity of causal diagrams

We can identify and reduce needless complexity with a few simple tests:

- **Unimportance**: If the arrows coming in and out of a variable are likely to be tiny and unimportant effects, we can probably remove the variable.
- **Redundancy**: If several variables on the diagram occupy the same space, that is, they have arrows coming in from and going out to the same variables, we can probably combine them and describe them together.
- **Mediators**: If a variable is on the graph only as a way for one variable to affect another, e.g. `\(B\)` in `\(A\to B\to C\)` where nothing else connects to `\(B\)`, then we can probably remove it and just have `\(A\to C\)` directly.
- **Irrelevance**: Some variables are an important part of the DGP but irrelevant to the research question at hand. These can be dropped, which removes further labels from the diagram.

---

## Drawing causal diagrams

.vcenter[
.blockquote[
### Example: Online classes

Potential causes for people to take online classes (*classes*):

- The preferences (*prefs*) of students might be driven by background factors like *race*, *gender*, *age*, and socioeconomic status (*SES*).
- Those same background factors might influence how much available time (*time*) students have.
- *Time* might be influenced by how many hours (*work*) a student works.
- Causes for people to drop out of community college (*DropOut*) may be *race*, *gender*, *SES*, *work*, and previous academic performance (*academics*).
]]

---

## Drawing causal diagrams

.vcenter[
.blockquote[
### Example: Online classes

Think about which variables cause which others.

- *prefs* `\(\to\)` *classes*, *race* `\(\to\)` *DropOut*, *gender* `\(\to\)` *DropOut*, *WebAccess* `\(\to\)` *classes*, and so on.
- If variables are related to each other without a clear causal arrow in either direction, we add common (unobserved) causes, which we simply call U1, U2, etc.
]]

---

## Drawing causal diagrams

.blockquote[
### Example: Online classes

<div class="figure">
<p class="caption">4: Online classes</p>
</div>
]

---

## Drawing causal diagrams

.vcenter[
.blockquote[
### Example: Online classes

Simplification of the causal diagram:

- *gender* and *race* have the exact same set of arrows coming in and going out, so we can combine them into one variable, which we can call *demographics*.
- Instead of having *gender*, *race*, *SES*, and *age* affect *prefs* and then having *prefs* affect *classes*, we can drop the mediator *prefs* and have those variables affect *classes* directly.
- We omit *WebAccess* using the same logic.
]]

---

## Drawing causal diagrams

.blockquote[
### Example: Online classes

After simplification, the causal diagram looks like:

<div class="figure">
<p class="caption">6: Simplified causal diagram for online classes</p>
</div>
]

---

## Drawing causal diagrams

**Avoiding cycles**

- A causal diagram cannot have cycles.
- A cycle says that a variable causes itself, so we cannot separate cause from effect.
- In the true DGP there cannot be any cycles either.
- Whenever we have a cycle in our diagram, we can get out of it by adding a **time dimension**: a variable at one point in time can only affect variables at later points in time.
- Alternatively, if we focus only on the part of a variable that is driven by randomness, the effect cannot loop back on itself.

---

## Drawing causal diagrams

**Avoiding cycles**

.vcenter[
.blockquote[
### Example: Cycles in a causal diagram

<br>

.pull-left[
<div class="figure">
<p class="caption">8: A simple loop in a causal diagram</p>
</div>
]

.pull-right[
<div class="figure">
<p class="caption">10: Another representation of the loop</p>
</div>
]]]

---

# Getting comfortable with assumptions

<br>

- Assumptions are unavoidable: causal diagrams should be based as much as possible on real-world knowledge and prior research, but we cannot possibly know everything about every part of the data generating process.
- The quality of our research will hinge on how accurate those assumptions are.
- For a given assumption we should ask ourselves:
  - "Is this probably true?"
  - "What evidence can we provide to push this away from possible and towards probable?"

---

# Making assumptions accurate

<br>

- Think about whether our assumptions are reasonable, and base them as much as possible on well-established knowledge and prior research. If there is reason to be skeptical of an assumption, ask what evidence would support it and try to provide that evidence.
- There are a few other approaches that can be taken:
  - The first is to get another set of eyes on the diagram, since it can be hard to be skeptical of our own assumptions.
  - There are also some more formal tests. Once the diagram is written down, it implies that certain relationships in the data should be zero. We can check those relationships in our actual data using basic correlations. If they are not zero, something is wrong with our diagram.
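
---

# Making assumptions accurate

The formal check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method taken from these slides: the chain `\(A\to B\to C\)`, the coefficients, the simulated data, and the helper `partial_corr` are all hypothetical. A diagram with only `\(A\to B\to C\)` implies that `\(A\)` and `\(C\)` are unrelated once `\(B\)` is held fixed, so their partial correlation given `\(B\)` should be close to zero.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation of x and y after linearly removing z from both.

    Hypothetical helper: regress x and y on z (with intercept) by least
    squares and correlate the two residual series.
    """
    design = np.column_stack([np.ones_like(z), z])
    resid_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    resid_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(resid_x, resid_y)[0, 1]


rng = np.random.default_rng(0)
n = 20_000

# Simulated data that really follow the assumed diagram A -> B -> C.
a = rng.normal(size=n)
b = 0.8 * a + rng.normal(size=n)
c = 0.5 * b + rng.normal(size=n)

# A and C are correlated overall, because A affects C through B ...
raw = np.corrcoef(a, c)[0, 1]
# ... but the diagram implies no relationship once B is held fixed.
implied_zero = partial_corr(a, c, b)

# Simulated data where the diagram is wrong: there is also a direct A -> C arrow.
c_direct = 0.5 * b + 0.4 * a + rng.normal(size=n)
violated = partial_corr(a, c_direct, b)

print(f"corr(A, C)                    = {raw:.3f}")
print(f"corr(A, C | B), diagram right = {implied_zero:.3f}")
print(f"corr(A, C | B), diagram wrong = {violated:.3f}")
```

If the partial correlation in real data is clearly different from zero, as in the last case, the diagram is missing an arrow or a common cause. Note that this sketch only picks up linear relationships, and a value near zero does not by itself prove that the diagram is correct.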